{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "0aae984d",
   "metadata": {},
   "source": [
    "# Python进阶-ML模型应用2_MultipleLinear Regression多元线性回归"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "065df589",
   "metadata": {},
   "source": [
    "Linear Regression线性回归公式：\n",
    "\n",
    "y=p+p<sub>1</sub> X<sub>1</sub> \n",
    "\n",
    "MultipleLinear Regression多元线性回归公式：\n",
    "\n",
    "y=p+p<sub>1</sub>X<sub>1</sub>+p<sub>2</sub>X<sub>2</sub>+p<sub>3</sub>X<sub>3</sub>+...+p<sub>n</sub>X<sub>n</sub> \n",
    "\n",
    "多元线性回归是一种用于评估一个因变量与多个自变量之间关系的方法。如果因变量的变化受到两个或两个以上重要因素的影响，就需要采用多元线性回归进行分析。其基本原理和计算过程与一元线性回归相同，但由于自变量个数多，计算过程相当复杂，通常需要借助统计软件进行运算。\n",
    "\n",
    "在多元线性回归模型中，设y为因变量，X1, X2…Xk为自变量，且自变量与因变量之间存在线性关系。此时，多元线性回归模型可以表示为：Y=b0+b1x1+…+bkxk+e。其中，b0是常数项，b1, b2…bk是回归系数。具体来说，b1代表当X1, X2…Xk固定时，x1每增加一个单位对y的影响，即x1对y的偏回归系数；类似地，b2表示当X1, X2…Xk固定时，x2每增加一个单位对y的影响，即x2对y的偏回归系数。\n",
    "\n",
    "需要注意的是，进行多元线性回归分析时需要满足一些条件，如一元线性回归、多个自变量之间不存在多重共线、正态性、独立性、方差齐性等。在实际运用中，可以通过因素分析和回归诊断来进一步解释和预测因变量的变化。\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2e3d9cdb",
   "metadata": {},
   "source": [
    "## 应用案例（一）美国初创公司盈利数据分析"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ffa368d7",
   "metadata": {},
   "source": [
    "### 1、导入函数库"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 129,
   "id": "38f9a3d3",
   "metadata": {},
   "outputs": [],
   "source": [
    "#数据处理\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "#绘图\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5254229c",
   "metadata": {},
   "source": [
    "### 2、加载数据集"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 130,
   "id": "625a8716",
   "metadata": {},
   "outputs": [],
   "source": [
    "#https://www.kaggle.com/farhanmd29/50-startups\n",
    "dataset = pd.read_csv('./kaggle-data/startups.csv')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0df624ff",
   "metadata": {},
   "source": [
    "### 3、数据分析"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 131,
   "id": "8213d83a",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>R&amp;D Spend</th>\n",
       "      <th>Administration</th>\n",
       "      <th>Marketing Spend</th>\n",
       "      <th>State</th>\n",
       "      <th>Profit</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>165349.20</td>\n",
       "      <td>136897.80</td>\n",
       "      <td>471784.10</td>\n",
       "      <td>New York</td>\n",
       "      <td>192261.83</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>162597.70</td>\n",
       "      <td>151377.59</td>\n",
       "      <td>443898.53</td>\n",
       "      <td>California</td>\n",
       "      <td>191792.06</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>153441.51</td>\n",
       "      <td>101145.55</td>\n",
       "      <td>407934.54</td>\n",
       "      <td>Florida</td>\n",
       "      <td>191050.39</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>144372.41</td>\n",
       "      <td>118671.85</td>\n",
       "      <td>383199.62</td>\n",
       "      <td>New York</td>\n",
       "      <td>182901.99</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>142107.34</td>\n",
       "      <td>91391.77</td>\n",
       "      <td>366168.42</td>\n",
       "      <td>Florida</td>\n",
       "      <td>166187.94</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>131876.90</td>\n",
       "      <td>99814.71</td>\n",
       "      <td>362861.36</td>\n",
       "      <td>New York</td>\n",
       "      <td>156991.12</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>134615.46</td>\n",
       "      <td>147198.87</td>\n",
       "      <td>127716.82</td>\n",
       "      <td>California</td>\n",
       "      <td>156122.51</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>130298.13</td>\n",
       "      <td>145530.06</td>\n",
       "      <td>323876.68</td>\n",
       "      <td>Florida</td>\n",
       "      <td>155752.60</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>120542.52</td>\n",
       "      <td>148718.95</td>\n",
       "      <td>311613.29</td>\n",
       "      <td>New York</td>\n",
       "      <td>152211.77</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>123334.88</td>\n",
       "      <td>108679.17</td>\n",
       "      <td>304981.62</td>\n",
       "      <td>California</td>\n",
       "      <td>149759.96</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>101913.08</td>\n",
       "      <td>110594.11</td>\n",
       "      <td>229160.95</td>\n",
       "      <td>Florida</td>\n",
       "      <td>146121.95</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>100671.96</td>\n",
       "      <td>91790.61</td>\n",
       "      <td>249744.55</td>\n",
       "      <td>California</td>\n",
       "      <td>144259.40</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>93863.75</td>\n",
       "      <td>127320.38</td>\n",
       "      <td>249839.44</td>\n",
       "      <td>Florida</td>\n",
       "      <td>141585.52</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>91992.39</td>\n",
       "      <td>135495.07</td>\n",
       "      <td>252664.93</td>\n",
       "      <td>California</td>\n",
       "      <td>134307.35</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>119943.24</td>\n",
       "      <td>156547.42</td>\n",
       "      <td>256512.92</td>\n",
       "      <td>Florida</td>\n",
       "      <td>132602.65</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>114523.61</td>\n",
       "      <td>122616.84</td>\n",
       "      <td>261776.23</td>\n",
       "      <td>New York</td>\n",
       "      <td>129917.04</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>78013.11</td>\n",
       "      <td>121597.55</td>\n",
       "      <td>264346.06</td>\n",
       "      <td>California</td>\n",
       "      <td>126992.93</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>94657.16</td>\n",
       "      <td>145077.58</td>\n",
       "      <td>282574.31</td>\n",
       "      <td>New York</td>\n",
       "      <td>125370.37</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>91749.16</td>\n",
       "      <td>114175.79</td>\n",
       "      <td>294919.57</td>\n",
       "      <td>Florida</td>\n",
       "      <td>124266.90</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>86419.70</td>\n",
       "      <td>153514.11</td>\n",
       "      <td>0.00</td>\n",
       "      <td>New York</td>\n",
       "      <td>122776.86</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>76253.86</td>\n",
       "      <td>113867.30</td>\n",
       "      <td>298664.47</td>\n",
       "      <td>California</td>\n",
       "      <td>118474.03</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>78389.47</td>\n",
       "      <td>153773.43</td>\n",
       "      <td>299737.29</td>\n",
       "      <td>New York</td>\n",
       "      <td>111313.02</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>73994.56</td>\n",
       "      <td>122782.75</td>\n",
       "      <td>303319.26</td>\n",
       "      <td>Florida</td>\n",
       "      <td>110352.25</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>67532.53</td>\n",
       "      <td>105751.03</td>\n",
       "      <td>304768.73</td>\n",
       "      <td>Florida</td>\n",
       "      <td>108733.99</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>77044.01</td>\n",
       "      <td>99281.34</td>\n",
       "      <td>140574.81</td>\n",
       "      <td>New York</td>\n",
       "      <td>108552.04</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>64664.71</td>\n",
       "      <td>139553.16</td>\n",
       "      <td>137962.62</td>\n",
       "      <td>California</td>\n",
       "      <td>107404.34</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>75328.87</td>\n",
       "      <td>144135.98</td>\n",
       "      <td>134050.07</td>\n",
       "      <td>Florida</td>\n",
       "      <td>105733.54</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>72107.60</td>\n",
       "      <td>127864.55</td>\n",
       "      <td>353183.81</td>\n",
       "      <td>New York</td>\n",
       "      <td>105008.31</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>66051.52</td>\n",
       "      <td>182645.56</td>\n",
       "      <td>118148.20</td>\n",
       "      <td>Florida</td>\n",
       "      <td>103282.38</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>29</th>\n",
       "      <td>65605.48</td>\n",
       "      <td>153032.06</td>\n",
       "      <td>107138.38</td>\n",
       "      <td>New York</td>\n",
       "      <td>101004.64</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>30</th>\n",
       "      <td>61994.48</td>\n",
       "      <td>115641.28</td>\n",
       "      <td>91131.24</td>\n",
       "      <td>Florida</td>\n",
       "      <td>99937.59</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>31</th>\n",
       "      <td>61136.38</td>\n",
       "      <td>152701.92</td>\n",
       "      <td>88218.23</td>\n",
       "      <td>New York</td>\n",
       "      <td>97483.56</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>32</th>\n",
       "      <td>63408.86</td>\n",
       "      <td>129219.61</td>\n",
       "      <td>46085.25</td>\n",
       "      <td>California</td>\n",
       "      <td>97427.84</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>33</th>\n",
       "      <td>55493.95</td>\n",
       "      <td>103057.49</td>\n",
       "      <td>214634.81</td>\n",
       "      <td>Florida</td>\n",
       "      <td>96778.92</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>34</th>\n",
       "      <td>46426.07</td>\n",
       "      <td>157693.92</td>\n",
       "      <td>210797.67</td>\n",
       "      <td>California</td>\n",
       "      <td>96712.80</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>35</th>\n",
       "      <td>46014.02</td>\n",
       "      <td>85047.44</td>\n",
       "      <td>205517.64</td>\n",
       "      <td>New York</td>\n",
       "      <td>96479.51</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>36</th>\n",
       "      <td>28663.76</td>\n",
       "      <td>127056.21</td>\n",
       "      <td>201126.82</td>\n",
       "      <td>Florida</td>\n",
       "      <td>90708.19</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>37</th>\n",
       "      <td>44069.95</td>\n",
       "      <td>51283.14</td>\n",
       "      <td>197029.42</td>\n",
       "      <td>California</td>\n",
       "      <td>89949.14</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>38</th>\n",
       "      <td>20229.59</td>\n",
       "      <td>65947.93</td>\n",
       "      <td>185265.10</td>\n",
       "      <td>New York</td>\n",
       "      <td>81229.06</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>39</th>\n",
       "      <td>38558.51</td>\n",
       "      <td>82982.09</td>\n",
       "      <td>174999.30</td>\n",
       "      <td>California</td>\n",
       "      <td>81005.76</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>40</th>\n",
       "      <td>28754.33</td>\n",
       "      <td>118546.05</td>\n",
       "      <td>172795.67</td>\n",
       "      <td>California</td>\n",
       "      <td>78239.91</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>41</th>\n",
       "      <td>27892.92</td>\n",
       "      <td>84710.77</td>\n",
       "      <td>164470.71</td>\n",
       "      <td>Florida</td>\n",
       "      <td>77798.83</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>42</th>\n",
       "      <td>23640.93</td>\n",
       "      <td>96189.63</td>\n",
       "      <td>148001.11</td>\n",
       "      <td>California</td>\n",
       "      <td>71498.49</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>43</th>\n",
       "      <td>15505.73</td>\n",
       "      <td>127382.30</td>\n",
       "      <td>35534.17</td>\n",
       "      <td>New York</td>\n",
       "      <td>69758.98</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>44</th>\n",
       "      <td>22177.74</td>\n",
       "      <td>154806.14</td>\n",
       "      <td>28334.72</td>\n",
       "      <td>California</td>\n",
       "      <td>65200.33</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>45</th>\n",
       "      <td>1000.23</td>\n",
       "      <td>124153.04</td>\n",
       "      <td>1903.93</td>\n",
       "      <td>New York</td>\n",
       "      <td>64926.08</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>46</th>\n",
       "      <td>1315.46</td>\n",
       "      <td>115816.21</td>\n",
       "      <td>297114.46</td>\n",
       "      <td>Florida</td>\n",
       "      <td>49490.75</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>47</th>\n",
       "      <td>0.00</td>\n",
       "      <td>135426.92</td>\n",
       "      <td>0.00</td>\n",
       "      <td>California</td>\n",
       "      <td>42559.73</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>48</th>\n",
       "      <td>542.05</td>\n",
       "      <td>51743.15</td>\n",
       "      <td>0.00</td>\n",
       "      <td>New York</td>\n",
       "      <td>35673.41</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>49</th>\n",
       "      <td>0.00</td>\n",
       "      <td>116983.80</td>\n",
       "      <td>45173.06</td>\n",
       "      <td>California</td>\n",
       "      <td>14681.40</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    R&D Spend  Administration  Marketing Spend       State     Profit\n",
       "0   165349.20       136897.80        471784.10    New York  192261.83\n",
       "1   162597.70       151377.59        443898.53  California  191792.06\n",
       "2   153441.51       101145.55        407934.54     Florida  191050.39\n",
       "3   144372.41       118671.85        383199.62    New York  182901.99\n",
       "4   142107.34        91391.77        366168.42     Florida  166187.94\n",
       "5   131876.90        99814.71        362861.36    New York  156991.12\n",
       "6   134615.46       147198.87        127716.82  California  156122.51\n",
       "7   130298.13       145530.06        323876.68     Florida  155752.60\n",
       "8   120542.52       148718.95        311613.29    New York  152211.77\n",
       "9   123334.88       108679.17        304981.62  California  149759.96\n",
       "10  101913.08       110594.11        229160.95     Florida  146121.95\n",
       "11  100671.96        91790.61        249744.55  California  144259.40\n",
       "12   93863.75       127320.38        249839.44     Florida  141585.52\n",
       "13   91992.39       135495.07        252664.93  California  134307.35\n",
       "14  119943.24       156547.42        256512.92     Florida  132602.65\n",
       "15  114523.61       122616.84        261776.23    New York  129917.04\n",
       "16   78013.11       121597.55        264346.06  California  126992.93\n",
       "17   94657.16       145077.58        282574.31    New York  125370.37\n",
       "18   91749.16       114175.79        294919.57     Florida  124266.90\n",
       "19   86419.70       153514.11             0.00    New York  122776.86\n",
       "20   76253.86       113867.30        298664.47  California  118474.03\n",
       "21   78389.47       153773.43        299737.29    New York  111313.02\n",
       "22   73994.56       122782.75        303319.26     Florida  110352.25\n",
       "23   67532.53       105751.03        304768.73     Florida  108733.99\n",
       "24   77044.01        99281.34        140574.81    New York  108552.04\n",
       "25   64664.71       139553.16        137962.62  California  107404.34\n",
       "26   75328.87       144135.98        134050.07     Florida  105733.54\n",
       "27   72107.60       127864.55        353183.81    New York  105008.31\n",
       "28   66051.52       182645.56        118148.20     Florida  103282.38\n",
       "29   65605.48       153032.06        107138.38    New York  101004.64\n",
       "30   61994.48       115641.28         91131.24     Florida   99937.59\n",
       "31   61136.38       152701.92         88218.23    New York   97483.56\n",
       "32   63408.86       129219.61         46085.25  California   97427.84\n",
       "33   55493.95       103057.49        214634.81     Florida   96778.92\n",
       "34   46426.07       157693.92        210797.67  California   96712.80\n",
       "35   46014.02        85047.44        205517.64    New York   96479.51\n",
       "36   28663.76       127056.21        201126.82     Florida   90708.19\n",
       "37   44069.95        51283.14        197029.42  California   89949.14\n",
       "38   20229.59        65947.93        185265.10    New York   81229.06\n",
       "39   38558.51        82982.09        174999.30  California   81005.76\n",
       "40   28754.33       118546.05        172795.67  California   78239.91\n",
       "41   27892.92        84710.77        164470.71     Florida   77798.83\n",
       "42   23640.93        96189.63        148001.11  California   71498.49\n",
       "43   15505.73       127382.30         35534.17    New York   69758.98\n",
       "44   22177.74       154806.14         28334.72  California   65200.33\n",
       "45    1000.23       124153.04          1903.93    New York   64926.08\n",
       "46    1315.46       115816.21        297114.46     Florida   49490.75\n",
       "47       0.00       135426.92             0.00  California   42559.73\n",
       "48     542.05        51743.15             0.00    New York   35673.41\n",
       "49       0.00       116983.80         45173.06  California   14681.40"
      ]
     },
     "execution_count": 131,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2f4df974",
   "metadata": {},
   "source": [
    "R&D Spend-研发费用\n",
    "\n",
    "Administration-管理费用\n",
    "\n",
    "Marketing Spend\t-市场营销支出\n",
    "\n",
    "State-州\n",
    "\n",
    "Profit-盈利"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ef8f472f",
   "metadata": {},
   "source": [
    "### 4、数据处理"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 132,
   "id": "2514d24d",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "#划分特征集和响应集\n",
    "x = dataset.drop(\"Profit\",axis=1)# inplace: 默认'bool' = False，若为true，则原数据集dataset也会跟着drop\n",
    "#或者通过iloc选取：X = dataset.iloc[:, 0:4]\n",
    "y = dataset[\"Profit\"]\n",
    "#y = dataset.iloc[:, 4]#第四列元素"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 133,
   "id": "3c546f77",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>R&amp;D Spend</th>\n",
       "      <th>Administration</th>\n",
       "      <th>Marketing Spend</th>\n",
       "      <th>State</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>165349.20</td>\n",
       "      <td>136897.80</td>\n",
       "      <td>471784.10</td>\n",
       "      <td>New York</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>162597.70</td>\n",
       "      <td>151377.59</td>\n",
       "      <td>443898.53</td>\n",
       "      <td>California</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>153441.51</td>\n",
       "      <td>101145.55</td>\n",
       "      <td>407934.54</td>\n",
       "      <td>Florida</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>144372.41</td>\n",
       "      <td>118671.85</td>\n",
       "      <td>383199.62</td>\n",
       "      <td>New York</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>142107.34</td>\n",
       "      <td>91391.77</td>\n",
       "      <td>366168.42</td>\n",
       "      <td>Florida</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>131876.90</td>\n",
       "      <td>99814.71</td>\n",
       "      <td>362861.36</td>\n",
       "      <td>New York</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>134615.46</td>\n",
       "      <td>147198.87</td>\n",
       "      <td>127716.82</td>\n",
       "      <td>California</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>130298.13</td>\n",
       "      <td>145530.06</td>\n",
       "      <td>323876.68</td>\n",
       "      <td>Florida</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>120542.52</td>\n",
       "      <td>148718.95</td>\n",
       "      <td>311613.29</td>\n",
       "      <td>New York</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>123334.88</td>\n",
       "      <td>108679.17</td>\n",
       "      <td>304981.62</td>\n",
       "      <td>California</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>101913.08</td>\n",
       "      <td>110594.11</td>\n",
       "      <td>229160.95</td>\n",
       "      <td>Florida</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>100671.96</td>\n",
       "      <td>91790.61</td>\n",
       "      <td>249744.55</td>\n",
       "      <td>California</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>93863.75</td>\n",
       "      <td>127320.38</td>\n",
       "      <td>249839.44</td>\n",
       "      <td>Florida</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>91992.39</td>\n",
       "      <td>135495.07</td>\n",
       "      <td>252664.93</td>\n",
       "      <td>California</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>119943.24</td>\n",
       "      <td>156547.42</td>\n",
       "      <td>256512.92</td>\n",
       "      <td>Florida</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>114523.61</td>\n",
       "      <td>122616.84</td>\n",
       "      <td>261776.23</td>\n",
       "      <td>New York</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>78013.11</td>\n",
       "      <td>121597.55</td>\n",
       "      <td>264346.06</td>\n",
       "      <td>California</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>94657.16</td>\n",
       "      <td>145077.58</td>\n",
       "      <td>282574.31</td>\n",
       "      <td>New York</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>91749.16</td>\n",
       "      <td>114175.79</td>\n",
       "      <td>294919.57</td>\n",
       "      <td>Florida</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>86419.70</td>\n",
       "      <td>153514.11</td>\n",
       "      <td>0.00</td>\n",
       "      <td>New York</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>76253.86</td>\n",
       "      <td>113867.30</td>\n",
       "      <td>298664.47</td>\n",
       "      <td>California</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>78389.47</td>\n",
       "      <td>153773.43</td>\n",
       "      <td>299737.29</td>\n",
       "      <td>New York</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>73994.56</td>\n",
       "      <td>122782.75</td>\n",
       "      <td>303319.26</td>\n",
       "      <td>Florida</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>67532.53</td>\n",
       "      <td>105751.03</td>\n",
       "      <td>304768.73</td>\n",
       "      <td>Florida</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>77044.01</td>\n",
       "      <td>99281.34</td>\n",
       "      <td>140574.81</td>\n",
       "      <td>New York</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>64664.71</td>\n",
       "      <td>139553.16</td>\n",
       "      <td>137962.62</td>\n",
       "      <td>California</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>75328.87</td>\n",
       "      <td>144135.98</td>\n",
       "      <td>134050.07</td>\n",
       "      <td>Florida</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>72107.60</td>\n",
       "      <td>127864.55</td>\n",
       "      <td>353183.81</td>\n",
       "      <td>New York</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>66051.52</td>\n",
       "      <td>182645.56</td>\n",
       "      <td>118148.20</td>\n",
       "      <td>Florida</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>29</th>\n",
       "      <td>65605.48</td>\n",
       "      <td>153032.06</td>\n",
       "      <td>107138.38</td>\n",
       "      <td>New York</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>30</th>\n",
       "      <td>61994.48</td>\n",
       "      <td>115641.28</td>\n",
       "      <td>91131.24</td>\n",
       "      <td>Florida</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>31</th>\n",
       "      <td>61136.38</td>\n",
       "      <td>152701.92</td>\n",
       "      <td>88218.23</td>\n",
       "      <td>New York</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>32</th>\n",
       "      <td>63408.86</td>\n",
       "      <td>129219.61</td>\n",
       "      <td>46085.25</td>\n",
       "      <td>California</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>33</th>\n",
       "      <td>55493.95</td>\n",
       "      <td>103057.49</td>\n",
       "      <td>214634.81</td>\n",
       "      <td>Florida</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>34</th>\n",
       "      <td>46426.07</td>\n",
       "      <td>157693.92</td>\n",
       "      <td>210797.67</td>\n",
       "      <td>California</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>35</th>\n",
       "      <td>46014.02</td>\n",
       "      <td>85047.44</td>\n",
       "      <td>205517.64</td>\n",
       "      <td>New York</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>36</th>\n",
       "      <td>28663.76</td>\n",
       "      <td>127056.21</td>\n",
       "      <td>201126.82</td>\n",
       "      <td>Florida</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>37</th>\n",
       "      <td>44069.95</td>\n",
       "      <td>51283.14</td>\n",
       "      <td>197029.42</td>\n",
       "      <td>California</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>38</th>\n",
       "      <td>20229.59</td>\n",
       "      <td>65947.93</td>\n",
       "      <td>185265.10</td>\n",
       "      <td>New York</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>39</th>\n",
       "      <td>38558.51</td>\n",
       "      <td>82982.09</td>\n",
       "      <td>174999.30</td>\n",
       "      <td>California</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>40</th>\n",
       "      <td>28754.33</td>\n",
       "      <td>118546.05</td>\n",
       "      <td>172795.67</td>\n",
       "      <td>California</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>41</th>\n",
       "      <td>27892.92</td>\n",
       "      <td>84710.77</td>\n",
       "      <td>164470.71</td>\n",
       "      <td>Florida</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>42</th>\n",
       "      <td>23640.93</td>\n",
       "      <td>96189.63</td>\n",
       "      <td>148001.11</td>\n",
       "      <td>California</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>43</th>\n",
       "      <td>15505.73</td>\n",
       "      <td>127382.30</td>\n",
       "      <td>35534.17</td>\n",
       "      <td>New York</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>44</th>\n",
       "      <td>22177.74</td>\n",
       "      <td>154806.14</td>\n",
       "      <td>28334.72</td>\n",
       "      <td>California</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>45</th>\n",
       "      <td>1000.23</td>\n",
       "      <td>124153.04</td>\n",
       "      <td>1903.93</td>\n",
       "      <td>New York</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>46</th>\n",
       "      <td>1315.46</td>\n",
       "      <td>115816.21</td>\n",
       "      <td>297114.46</td>\n",
       "      <td>Florida</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>47</th>\n",
       "      <td>0.00</td>\n",
       "      <td>135426.92</td>\n",
       "      <td>0.00</td>\n",
       "      <td>California</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>48</th>\n",
       "      <td>542.05</td>\n",
       "      <td>51743.15</td>\n",
       "      <td>0.00</td>\n",
       "      <td>New York</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>49</th>\n",
       "      <td>0.00</td>\n",
       "      <td>116983.80</td>\n",
       "      <td>45173.06</td>\n",
       "      <td>California</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    R&D Spend  Administration  Marketing Spend       State\n",
       "0   165349.20       136897.80        471784.10    New York\n",
       "1   162597.70       151377.59        443898.53  California\n",
       "2   153441.51       101145.55        407934.54     Florida\n",
       "3   144372.41       118671.85        383199.62    New York\n",
       "4   142107.34        91391.77        366168.42     Florida\n",
       "5   131876.90        99814.71        362861.36    New York\n",
       "6   134615.46       147198.87        127716.82  California\n",
       "7   130298.13       145530.06        323876.68     Florida\n",
       "8   120542.52       148718.95        311613.29    New York\n",
       "9   123334.88       108679.17        304981.62  California\n",
       "10  101913.08       110594.11        229160.95     Florida\n",
       "11  100671.96        91790.61        249744.55  California\n",
       "12   93863.75       127320.38        249839.44     Florida\n",
       "13   91992.39       135495.07        252664.93  California\n",
       "14  119943.24       156547.42        256512.92     Florida\n",
       "15  114523.61       122616.84        261776.23    New York\n",
       "16   78013.11       121597.55        264346.06  California\n",
       "17   94657.16       145077.58        282574.31    New York\n",
       "18   91749.16       114175.79        294919.57     Florida\n",
       "19   86419.70       153514.11             0.00    New York\n",
       "20   76253.86       113867.30        298664.47  California\n",
       "21   78389.47       153773.43        299737.29    New York\n",
       "22   73994.56       122782.75        303319.26     Florida\n",
       "23   67532.53       105751.03        304768.73     Florida\n",
       "24   77044.01        99281.34        140574.81    New York\n",
       "25   64664.71       139553.16        137962.62  California\n",
       "26   75328.87       144135.98        134050.07     Florida\n",
       "27   72107.60       127864.55        353183.81    New York\n",
       "28   66051.52       182645.56        118148.20     Florida\n",
       "29   65605.48       153032.06        107138.38    New York\n",
       "30   61994.48       115641.28         91131.24     Florida\n",
       "31   61136.38       152701.92         88218.23    New York\n",
       "32   63408.86       129219.61         46085.25  California\n",
       "33   55493.95       103057.49        214634.81     Florida\n",
       "34   46426.07       157693.92        210797.67  California\n",
       "35   46014.02        85047.44        205517.64    New York\n",
       "36   28663.76       127056.21        201126.82     Florida\n",
       "37   44069.95        51283.14        197029.42  California\n",
       "38   20229.59        65947.93        185265.10    New York\n",
       "39   38558.51        82982.09        174999.30  California\n",
       "40   28754.33       118546.05        172795.67  California\n",
       "41   27892.92        84710.77        164470.71     Florida\n",
       "42   23640.93        96189.63        148001.11  California\n",
       "43   15505.73       127382.30         35534.17    New York\n",
       "44   22177.74       154806.14         28334.72  California\n",
       "45    1000.23       124153.04          1903.93    New York\n",
       "46    1315.46       115816.21        297114.46     Florida\n",
       "47       0.00       135426.92             0.00  California\n",
       "48     542.05        51743.15             0.00    New York\n",
       "49       0.00       116983.80         45173.06  California"
      ]
     },
     "execution_count": 133,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b482d503",
   "metadata": {},
   "source": [
    "#### (1)变量类型转换\n",
    "\n",
    "State列不是数字而是字符，需要转变成数字类型，计算机才能进行计算 \n",
    "\n",
    "通过pandas库里的get_dummies函数，将State里面的三种不同变量用虚拟变量0\\1表示。\n",
    "\n",
    "get_dummies 是一个常用于处理分类变量的函数，它能够将分类变量转换为虚拟变量（dummy variables）。在Python中，可以使用pandas库中的 get_dummies 函数来实现这一功能。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 134,
   "id": "092dc3c0",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array(['New York', 'California', 'Florida'], dtype=object)"
      ]
     },
     "execution_count": 134,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x[\"State\"].unique()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 110,
   "id": "421285f2",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>California</th>\n",
       "      <th>Florida</th>\n",
       "      <th>New York</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>29</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>30</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>31</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>32</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>33</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>34</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>35</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>36</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>37</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>38</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>39</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>40</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>41</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>42</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>43</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>44</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>45</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>46</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>47</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>48</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>49</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    California  Florida  New York\n",
       "0            0        0         1\n",
       "1            1        0         0\n",
       "2            0        1         0\n",
       "3            0        0         1\n",
       "4            0        1         0\n",
       "5            0        0         1\n",
       "6            1        0         0\n",
       "7            0        1         0\n",
       "8            0        0         1\n",
       "9            1        0         0\n",
       "10           0        1         0\n",
       "11           1        0         0\n",
       "12           0        1         0\n",
       "13           1        0         0\n",
       "14           0        1         0\n",
       "15           0        0         1\n",
       "16           1        0         0\n",
       "17           0        0         1\n",
       "18           0        1         0\n",
       "19           0        0         1\n",
       "20           1        0         0\n",
       "21           0        0         1\n",
       "22           0        1         0\n",
       "23           0        1         0\n",
       "24           0        0         1\n",
       "25           1        0         0\n",
       "26           0        1         0\n",
       "27           0        0         1\n",
       "28           0        1         0\n",
       "29           0        0         1\n",
       "30           0        1         0\n",
       "31           0        0         1\n",
       "32           1        0         0\n",
       "33           0        1         0\n",
       "34           1        0         0\n",
       "35           0        0         1\n",
       "36           0        1         0\n",
       "37           1        0         0\n",
       "38           0        0         1\n",
       "39           1        0         0\n",
       "40           1        0         0\n",
       "41           0        1         0\n",
       "42           1        0         0\n",
       "43           0        0         1\n",
       "44           1        0         0\n",
       "45           0        0         1\n",
       "46           0        1         0\n",
       "47           1        0         0\n",
       "48           0        0         1\n",
       "49           1        0         0"
      ]
     },
     "execution_count": 110,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.get_dummies(x['State'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 111,
   "id": "70b963dc",
   "metadata": {},
   "outputs": [],
   "source": [
    "#因为三个State是相互互斥的关系，所以用2个bit位就能表示三种关系。\n",
    "statesdump=pd.get_dummies(x['State'],drop_first=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 112,
   "id": "3af3af4c",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Florida</th>\n",
       "      <th>New York</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>29</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>30</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>31</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>32</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>33</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>34</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>35</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>36</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>37</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>38</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>39</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>40</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>41</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>42</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>43</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>44</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>45</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>46</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>47</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>48</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>49</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    Florida  New York\n",
       "0         0         1\n",
       "1         0         0\n",
       "2         1         0\n",
       "3         0         1\n",
       "4         1         0\n",
       "5         0         1\n",
       "6         0         0\n",
       "7         1         0\n",
       "8         0         1\n",
       "9         0         0\n",
       "10        1         0\n",
       "11        0         0\n",
       "12        1         0\n",
       "13        0         0\n",
       "14        1         0\n",
       "15        0         1\n",
       "16        0         0\n",
       "17        0         1\n",
       "18        1         0\n",
       "19        0         1\n",
       "20        0         0\n",
       "21        0         1\n",
       "22        1         0\n",
       "23        1         0\n",
       "24        0         1\n",
       "25        0         0\n",
       "26        1         0\n",
       "27        0         1\n",
       "28        1         0\n",
       "29        0         1\n",
       "30        1         0\n",
       "31        0         1\n",
       "32        0         0\n",
       "33        1         0\n",
       "34        0         0\n",
       "35        0         1\n",
       "36        1         0\n",
       "37        0         0\n",
       "38        0         1\n",
       "39        0         0\n",
       "40        0         0\n",
       "41        1         0\n",
       "42        0         0\n",
       "43        0         1\n",
       "44        0         0\n",
       "45        0         1\n",
       "46        1         0\n",
       "47        0         0\n",
       "48        0         1\n",
       "49        0         0"
      ]
     },
     "execution_count": 112,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "statesdump"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 113,
   "id": "bae09003",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>R&amp;D Spend</th>\n",
       "      <th>Administration</th>\n",
       "      <th>Marketing Spend</th>\n",
       "      <th>Florida</th>\n",
       "      <th>New York</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>165349.20</td>\n",
       "      <td>136897.80</td>\n",
       "      <td>471784.10</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>162597.70</td>\n",
       "      <td>151377.59</td>\n",
       "      <td>443898.53</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>153441.51</td>\n",
       "      <td>101145.55</td>\n",
       "      <td>407934.54</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>144372.41</td>\n",
       "      <td>118671.85</td>\n",
       "      <td>383199.62</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>142107.34</td>\n",
       "      <td>91391.77</td>\n",
       "      <td>366168.42</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>131876.90</td>\n",
       "      <td>99814.71</td>\n",
       "      <td>362861.36</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>134615.46</td>\n",
       "      <td>147198.87</td>\n",
       "      <td>127716.82</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>130298.13</td>\n",
       "      <td>145530.06</td>\n",
       "      <td>323876.68</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>120542.52</td>\n",
       "      <td>148718.95</td>\n",
       "      <td>311613.29</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>123334.88</td>\n",
       "      <td>108679.17</td>\n",
       "      <td>304981.62</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>101913.08</td>\n",
       "      <td>110594.11</td>\n",
       "      <td>229160.95</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>100671.96</td>\n",
       "      <td>91790.61</td>\n",
       "      <td>249744.55</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>93863.75</td>\n",
       "      <td>127320.38</td>\n",
       "      <td>249839.44</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>91992.39</td>\n",
       "      <td>135495.07</td>\n",
       "      <td>252664.93</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>119943.24</td>\n",
       "      <td>156547.42</td>\n",
       "      <td>256512.92</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>114523.61</td>\n",
       "      <td>122616.84</td>\n",
       "      <td>261776.23</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>78013.11</td>\n",
       "      <td>121597.55</td>\n",
       "      <td>264346.06</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>94657.16</td>\n",
       "      <td>145077.58</td>\n",
       "      <td>282574.31</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>91749.16</td>\n",
       "      <td>114175.79</td>\n",
       "      <td>294919.57</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>86419.70</td>\n",
       "      <td>153514.11</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>76253.86</td>\n",
       "      <td>113867.30</td>\n",
       "      <td>298664.47</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>78389.47</td>\n",
       "      <td>153773.43</td>\n",
       "      <td>299737.29</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>73994.56</td>\n",
       "      <td>122782.75</td>\n",
       "      <td>303319.26</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>67532.53</td>\n",
       "      <td>105751.03</td>\n",
       "      <td>304768.73</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>77044.01</td>\n",
       "      <td>99281.34</td>\n",
       "      <td>140574.81</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>64664.71</td>\n",
       "      <td>139553.16</td>\n",
       "      <td>137962.62</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>75328.87</td>\n",
       "      <td>144135.98</td>\n",
       "      <td>134050.07</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>72107.60</td>\n",
       "      <td>127864.55</td>\n",
       "      <td>353183.81</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>66051.52</td>\n",
       "      <td>182645.56</td>\n",
       "      <td>118148.20</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>29</th>\n",
       "      <td>65605.48</td>\n",
       "      <td>153032.06</td>\n",
       "      <td>107138.38</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>30</th>\n",
       "      <td>61994.48</td>\n",
       "      <td>115641.28</td>\n",
       "      <td>91131.24</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>31</th>\n",
       "      <td>61136.38</td>\n",
       "      <td>152701.92</td>\n",
       "      <td>88218.23</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>32</th>\n",
       "      <td>63408.86</td>\n",
       "      <td>129219.61</td>\n",
       "      <td>46085.25</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>33</th>\n",
       "      <td>55493.95</td>\n",
       "      <td>103057.49</td>\n",
       "      <td>214634.81</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>34</th>\n",
       "      <td>46426.07</td>\n",
       "      <td>157693.92</td>\n",
       "      <td>210797.67</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>35</th>\n",
       "      <td>46014.02</td>\n",
       "      <td>85047.44</td>\n",
       "      <td>205517.64</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>36</th>\n",
       "      <td>28663.76</td>\n",
       "      <td>127056.21</td>\n",
       "      <td>201126.82</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>37</th>\n",
       "      <td>44069.95</td>\n",
       "      <td>51283.14</td>\n",
       "      <td>197029.42</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>38</th>\n",
       "      <td>20229.59</td>\n",
       "      <td>65947.93</td>\n",
       "      <td>185265.10</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>39</th>\n",
       "      <td>38558.51</td>\n",
       "      <td>82982.09</td>\n",
       "      <td>174999.30</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>40</th>\n",
       "      <td>28754.33</td>\n",
       "      <td>118546.05</td>\n",
       "      <td>172795.67</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>41</th>\n",
       "      <td>27892.92</td>\n",
       "      <td>84710.77</td>\n",
       "      <td>164470.71</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>42</th>\n",
       "      <td>23640.93</td>\n",
       "      <td>96189.63</td>\n",
       "      <td>148001.11</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>43</th>\n",
       "      <td>15505.73</td>\n",
       "      <td>127382.30</td>\n",
       "      <td>35534.17</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>44</th>\n",
       "      <td>22177.74</td>\n",
       "      <td>154806.14</td>\n",
       "      <td>28334.72</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>45</th>\n",
       "      <td>1000.23</td>\n",
       "      <td>124153.04</td>\n",
       "      <td>1903.93</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>46</th>\n",
       "      <td>1315.46</td>\n",
       "      <td>115816.21</td>\n",
       "      <td>297114.46</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>47</th>\n",
       "      <td>0.00</td>\n",
       "      <td>135426.92</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>48</th>\n",
       "      <td>542.05</td>\n",
       "      <td>51743.15</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>49</th>\n",
       "      <td>0.00</td>\n",
       "      <td>116983.80</td>\n",
       "      <td>45173.06</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    R&D Spend  Administration  Marketing Spend  Florida  New York\n",
       "0   165349.20       136897.80        471784.10        0         1\n",
       "1   162597.70       151377.59        443898.53        0         0\n",
       "2   153441.51       101145.55        407934.54        1         0\n",
       "3   144372.41       118671.85        383199.62        0         1\n",
       "4   142107.34        91391.77        366168.42        1         0\n",
       "5   131876.90        99814.71        362861.36        0         1\n",
       "6   134615.46       147198.87        127716.82        0         0\n",
       "7   130298.13       145530.06        323876.68        1         0\n",
       "8   120542.52       148718.95        311613.29        0         1\n",
       "9   123334.88       108679.17        304981.62        0         0\n",
       "10  101913.08       110594.11        229160.95        1         0\n",
       "11  100671.96        91790.61        249744.55        0         0\n",
       "12   93863.75       127320.38        249839.44        1         0\n",
       "13   91992.39       135495.07        252664.93        0         0\n",
       "14  119943.24       156547.42        256512.92        1         0\n",
       "15  114523.61       122616.84        261776.23        0         1\n",
       "16   78013.11       121597.55        264346.06        0         0\n",
       "17   94657.16       145077.58        282574.31        0         1\n",
       "18   91749.16       114175.79        294919.57        1         0\n",
       "19   86419.70       153514.11             0.00        0         1\n",
       "20   76253.86       113867.30        298664.47        0         0\n",
       "21   78389.47       153773.43        299737.29        0         1\n",
       "22   73994.56       122782.75        303319.26        1         0\n",
       "23   67532.53       105751.03        304768.73        1         0\n",
       "24   77044.01        99281.34        140574.81        0         1\n",
       "25   64664.71       139553.16        137962.62        0         0\n",
       "26   75328.87       144135.98        134050.07        1         0\n",
       "27   72107.60       127864.55        353183.81        0         1\n",
       "28   66051.52       182645.56        118148.20        1         0\n",
       "29   65605.48       153032.06        107138.38        0         1\n",
       "30   61994.48       115641.28         91131.24        1         0\n",
       "31   61136.38       152701.92         88218.23        0         1\n",
       "32   63408.86       129219.61         46085.25        0         0\n",
       "33   55493.95       103057.49        214634.81        1         0\n",
       "34   46426.07       157693.92        210797.67        0         0\n",
       "35   46014.02        85047.44        205517.64        0         1\n",
       "36   28663.76       127056.21        201126.82        1         0\n",
       "37   44069.95        51283.14        197029.42        0         0\n",
       "38   20229.59        65947.93        185265.10        0         1\n",
       "39   38558.51        82982.09        174999.30        0         0\n",
       "40   28754.33       118546.05        172795.67        0         0\n",
       "41   27892.92        84710.77        164470.71        1         0\n",
       "42   23640.93        96189.63        148001.11        0         0\n",
       "43   15505.73       127382.30         35534.17        0         1\n",
       "44   22177.74       154806.14         28334.72        0         0\n",
       "45    1000.23       124153.04          1903.93        0         1\n",
       "46    1315.46       115816.21        297114.46        1         0\n",
       "47       0.00       135426.92             0.00        0         0\n",
       "48     542.05        51743.15             0.00        0         1\n",
       "49       0.00       116983.80         45173.06        0         0"
      ]
     },
     "execution_count": 113,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#将原本特征划分集中的State删去，连接使用虚拟变量替代后后的State变量\n",
    "x=x.drop('State',axis=1)\n",
    "x=pd.concat([x,statesdump],axis=1)\n",
    "x"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 114,
   "id": "351933f2",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 50 entries, 0 to 49\n",
      "Data columns (total 5 columns):\n",
      " #   Column           Non-Null Count  Dtype  \n",
      "---  ------           --------------  -----  \n",
      " 0   R&D Spend        50 non-null     float64\n",
      " 1   Administration   50 non-null     float64\n",
      " 2   Marketing Spend  50 non-null     float64\n",
      " 3   Florida          50 non-null     uint8  \n",
      " 4   New York         50 non-null     uint8  \n",
      "dtypes: float64(3), uint8(2)\n",
      "memory usage: 1.4 KB\n"
     ]
    }
   ],
   "source": [
    "x.info()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "25be1613",
   "metadata": {},
   "source": [
    "### 5、数据集拆分"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 115,
   "id": "0514e17c",
   "metadata": {},
   "outputs": [],
   "source": [
    "#划分数据集为测试集和训练集\n",
    "from sklearn.model_selection import train_test_split\n",
    " \n",
    "X_train, X_test, y_train, y_test = train_test_split(x, y, test_size = 0.3, random_state = 42)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "32b844e5",
   "metadata": {},
   "source": [
    "### 6、模型建立与训练"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 116,
   "id": "620f35f4",
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.linear_model import LinearRegression\n",
    "regressor = LinearRegression()\n",
    "model=regressor.fit(X_train, y_train)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3f169674",
   "metadata": {},
   "source": [
    "### 7、模型预测与评估"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 117,
   "id": "0c0d537f",
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.metrics import r2_score\n",
    "y_predict = regressor.predict(X_test)\n",
    "score1=r2_score(y_test,y_predict)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 118,
   "id": "5d8db924",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.9397108063368201"
      ]
     },
     "execution_count": 118,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "score1#R2分数：它是决定系数，用于衡量模型对数据的拟合程度。它的取值范围在0到1之间，越接近1表示模型对数据的拟合程度越高。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9851dbbf",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "778e4c73",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f357e29d",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "b731a1cf",
   "metadata": {},
   "source": [
    "## 应用案例（二）Age、Height、Weight、Bmi数据分析"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 119,
   "id": "8af90f65",
   "metadata": {},
   "outputs": [],
   "source": [
    "#处理数据\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "#绘制图像\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "#机器学习模型\n",
    "from sklearn.linear_model import LinearRegression"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 120,
   "id": "6239f9a9",
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pd.read_csv(\"./kaggle-data/bmi.csv\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 121,
   "id": "01c1c682",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Age</th>\n",
       "      <th>Height</th>\n",
       "      <th>Weight</th>\n",
       "      <th>Bmi</th>\n",
       "      <th>BmiClass</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>61</td>\n",
       "      <td>1.85</td>\n",
       "      <td>109.30</td>\n",
       "      <td>31.935720</td>\n",
       "      <td>Obese Class 1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>60</td>\n",
       "      <td>1.71</td>\n",
       "      <td>79.02</td>\n",
       "      <td>27.023700</td>\n",
       "      <td>Overweight</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>60</td>\n",
       "      <td>1.55</td>\n",
       "      <td>74.70</td>\n",
       "      <td>31.092612</td>\n",
       "      <td>Obese Class 1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>60</td>\n",
       "      <td>1.46</td>\n",
       "      <td>35.90</td>\n",
       "      <td>16.841809</td>\n",
       "      <td>Underweight</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>60</td>\n",
       "      <td>1.58</td>\n",
       "      <td>97.10</td>\n",
       "      <td>38.896010</td>\n",
       "      <td>Obese Class 2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>736</th>\n",
       "      <td>34</td>\n",
       "      <td>1.86</td>\n",
       "      <td>95.70</td>\n",
       "      <td>27.662157</td>\n",
       "      <td>Overweight</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>737</th>\n",
       "      <td>44</td>\n",
       "      <td>1.91</td>\n",
       "      <td>106.90</td>\n",
       "      <td>29.302925</td>\n",
       "      <td>Overweight</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>738</th>\n",
       "      <td>25</td>\n",
       "      <td>1.82</td>\n",
       "      <td>88.40</td>\n",
       "      <td>26.687598</td>\n",
       "      <td>Overweight</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>739</th>\n",
       "      <td>35</td>\n",
       "      <td>1.88</td>\n",
       "      <td>98.50</td>\n",
       "      <td>27.868945</td>\n",
       "      <td>Overweight</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>740</th>\n",
       "      <td>45</td>\n",
       "      <td>1.93</td>\n",
       "      <td>109.90</td>\n",
       "      <td>29.504148</td>\n",
       "      <td>Overweight</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>741 rows × 5 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "     Age  Height  Weight        Bmi       BmiClass\n",
       "0     61    1.85  109.30  31.935720  Obese Class 1\n",
       "1     60    1.71   79.02  27.023700     Overweight\n",
       "2     60    1.55   74.70  31.092612  Obese Class 1\n",
       "3     60    1.46   35.90  16.841809    Underweight\n",
       "4     60    1.58   97.10  38.896010  Obese Class 2\n",
       "..   ...     ...     ...        ...            ...\n",
       "736   34    1.86   95.70  27.662157     Overweight\n",
       "737   44    1.91  106.90  29.302925     Overweight\n",
       "738   25    1.82   88.40  26.687598     Overweight\n",
       "739   35    1.88   98.50  27.868945     Overweight\n",
       "740   45    1.93  109.90  29.504148     Overweight\n",
       "\n",
       "[741 rows x 5 columns]"
      ]
     },
     "execution_count": 121,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 122,
   "id": "e75ad858",
   "metadata": {},
   "outputs": [],
   "source": [
    "X = df.iloc[:,:3]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 123,
   "id": "3a694233",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Age</th>\n",
       "      <th>Height</th>\n",
       "      <th>Weight</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>61</td>\n",
       "      <td>1.85</td>\n",
       "      <td>109.30</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>60</td>\n",
       "      <td>1.71</td>\n",
       "      <td>79.02</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>60</td>\n",
       "      <td>1.55</td>\n",
       "      <td>74.70</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>60</td>\n",
       "      <td>1.46</td>\n",
       "      <td>35.90</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>60</td>\n",
       "      <td>1.58</td>\n",
       "      <td>97.10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>736</th>\n",
       "      <td>34</td>\n",
       "      <td>1.86</td>\n",
       "      <td>95.70</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>737</th>\n",
       "      <td>44</td>\n",
       "      <td>1.91</td>\n",
       "      <td>106.90</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>738</th>\n",
       "      <td>25</td>\n",
       "      <td>1.82</td>\n",
       "      <td>88.40</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>739</th>\n",
       "      <td>35</td>\n",
       "      <td>1.88</td>\n",
       "      <td>98.50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>740</th>\n",
       "      <td>45</td>\n",
       "      <td>1.93</td>\n",
       "      <td>109.90</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>741 rows × 3 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "     Age  Height  Weight\n",
       "0     61    1.85  109.30\n",
       "1     60    1.71   79.02\n",
       "2     60    1.55   74.70\n",
       "3     60    1.46   35.90\n",
       "4     60    1.58   97.10\n",
       "..   ...     ...     ...\n",
       "736   34    1.86   95.70\n",
       "737   44    1.91  106.90\n",
       "738   25    1.82   88.40\n",
       "739   35    1.88   98.50\n",
       "740   45    1.93  109.90\n",
       "\n",
       "[741 rows x 3 columns]"
      ]
     },
     "execution_count": 123,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 124,
   "id": "b64cd8dd",
   "metadata": {},
   "outputs": [],
   "source": [
    "y = df.loc[:,\"Bmi\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 125,
   "id": "bfec97e6",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0      31.935720\n",
       "1      27.023700\n",
       "2      31.092612\n",
       "3      16.841809\n",
       "4      38.896010\n",
       "         ...    \n",
       "736    27.662157\n",
       "737    29.302925\n",
       "738    26.687598\n",
       "739    27.868945\n",
       "740    29.504148\n",
       "Name: Bmi, Length: 741, dtype: float64"
      ]
     },
     "execution_count": 125,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "y"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 126,
   "id": "ca6440ff",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([126187.39411512,  85788.82259489,  99777.0281516 ,  45706.12238325,\n",
       "       127062.20722787,  51891.83884402, 109114.62977498, 100600.61123707,\n",
       "        97953.99874703, 111730.57706801, 128818.49200667, 174195.35772603,\n",
       "        93736.28538393, 148381.04097174, 172313.87139381])"
      ]
     },
     "execution_count": 126,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from sklearn.model_selection import train_test_split\n",
    "\n",
    "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)\n",
    "linreg=LinearRegression()#创建LinearRegression类的实例对象\n",
    "linreg.fit(X,y)\n",
    "linreg\n",
    "y_predict"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 127,
   "id": "72a033c6",
   "metadata": {
    "collapsed": true
   },
   "outputs": [
    {
     "ename": "ValueError",
     "evalue": "Found input variables with inconsistent numbers of samples: [223, 15]",
     "output_type": "error",
     "traceback": [
      "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[1;31mValueError\u001b[0m                                Traceback (most recent call last)",
      "\u001b[1;32m~\\AppData\\Local\\Temp/ipykernel_9200/3026027948.py\u001b[0m in \u001b[0;36m<module>\u001b[1;34m\u001b[0m\n\u001b[0;32m      2\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m      3\u001b[0m \u001b[1;31m# 评估指标\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 4\u001b[1;33m \u001b[0mmse\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mmean_squared_error\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0my_test\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0my_predict\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m      5\u001b[0m \u001b[0mrmse\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mnp\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0msqrt\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mmse\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m      6\u001b[0m \u001b[0mmae\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mmean_absolute_error\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0my_test\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0my_predict\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
      "\u001b[1;32mC:\\ProgramData\\Anaconda3\\lib\\site-packages\\sklearn\\utils\\validation.py\u001b[0m in \u001b[0;36minner_f\u001b[1;34m(*args, **kwargs)\u001b[0m\n\u001b[0;32m     61\u001b[0m             \u001b[0mextra_args\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mlen\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0margs\u001b[0m\u001b[1;33m)\u001b[0m \u001b[1;33m-\u001b[0m \u001b[0mlen\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mall_args\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m     62\u001b[0m             \u001b[1;32mif\u001b[0m \u001b[0mextra_args\u001b[0m \u001b[1;33m<=\u001b[0m \u001b[1;36m0\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m---> 63\u001b[1;33m                 \u001b[1;32mreturn\u001b[0m \u001b[0mf\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m*\u001b[0m\u001b[0margs\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;33m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m     64\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m     65\u001b[0m             \u001b[1;31m# extra_args > 0\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
      "\u001b[1;32mC:\\ProgramData\\Anaconda3\\lib\\site-packages\\sklearn\\metrics\\_regression.py\u001b[0m in \u001b[0;36mmean_squared_error\u001b[1;34m(y_true, y_pred, sample_weight, multioutput, squared)\u001b[0m\n\u001b[0;32m    333\u001b[0m     \u001b[1;36m0.825\u001b[0m\u001b[1;33m...\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m    334\u001b[0m     \"\"\"\n\u001b[1;32m--> 335\u001b[1;33m     y_type, y_true, y_pred, multioutput = _check_reg_targets(\n\u001b[0m\u001b[0;32m    336\u001b[0m         y_true, y_pred, multioutput)\n\u001b[0;32m    337\u001b[0m     \u001b[0mcheck_consistent_length\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0my_true\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0my_pred\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0msample_weight\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
      "\u001b[1;32mC:\\ProgramData\\Anaconda3\\lib\\site-packages\\sklearn\\metrics\\_regression.py\u001b[0m in \u001b[0;36m_check_reg_targets\u001b[1;34m(y_true, y_pred, multioutput, dtype)\u001b[0m\n\u001b[0;32m     86\u001b[0m         \u001b[0mthe\u001b[0m \u001b[0mdtype\u001b[0m \u001b[0margument\u001b[0m \u001b[0mpassed\u001b[0m \u001b[0mto\u001b[0m \u001b[0mcheck_array\u001b[0m\u001b[1;33m.\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m     87\u001b[0m     \"\"\"\n\u001b[1;32m---> 88\u001b[1;33m     \u001b[0mcheck_consistent_length\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0my_true\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0my_pred\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m     89\u001b[0m     \u001b[0my_true\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mcheck_array\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0my_true\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mensure_2d\u001b[0m\u001b[1;33m=\u001b[0m\u001b[1;32mFalse\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mdtype\u001b[0m\u001b[1;33m=\u001b[0m\u001b[0mdtype\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m     90\u001b[0m     \u001b[0my_pred\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mcheck_array\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0my_pred\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mensure_2d\u001b[0m\u001b[1;33m=\u001b[0m\u001b[1;32mFalse\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mdtype\u001b[0m\u001b[1;33m=\u001b[0m\u001b[0mdtype\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
      "\u001b[1;32mC:\\ProgramData\\Anaconda3\\lib\\site-packages\\sklearn\\utils\\validation.py\u001b[0m in \u001b[0;36mcheck_consistent_length\u001b[1;34m(*arrays)\u001b[0m\n\u001b[0;32m    317\u001b[0m     \u001b[0muniques\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mnp\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0munique\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mlengths\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m    318\u001b[0m     \u001b[1;32mif\u001b[0m \u001b[0mlen\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0muniques\u001b[0m\u001b[1;33m)\u001b[0m \u001b[1;33m>\u001b[0m \u001b[1;36m1\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m--> 319\u001b[1;33m         raise ValueError(\"Found input variables with inconsistent numbers of\"\n\u001b[0m\u001b[0;32m    320\u001b[0m                          \" samples: %r\" % [int(l) for l in lengths])\n\u001b[0;32m    321\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n",
      "\u001b[1;31mValueError\u001b[0m: Found input variables with inconsistent numbers of samples: [223, 15]"
     ]
    }
   ],
   "source": [
    "from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score\n",
    "\n",
    "# 评估指标\n",
    "mse = mean_squared_error(y_test, y_predict)\n",
    "rmse = np.sqrt(mse)\n",
    "mae = mean_absolute_error(y_test, y_predict)\n",
    "r2 = r2_score(y_test, y_predict)\n",
    "\n",
    "print(\"均方误差（MSE）：\", mse)\n",
    "print(\"均方根误差（RMSE）：\", rmse)\n",
    "print(\"平均绝对误差（MAE）：\", mae)\n",
    "print(\"R2分数：\", r2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 128,
   "id": "b04a5957",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[23.1764776]\n",
      "23.37472607742878\n"
     ]
    }
   ],
   "source": [
    "predict_val = linreg.predict([[30,1.85,80]])\n",
    "print(predict_val)\n",
    "print(80/1.85**2)#BMI"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "01e1aae8",
   "metadata": {},
   "source": [
    "BMI = 体重（千克）÷ 身高（米）的平方"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9ad289eb",
   "metadata": {},
   "source": [
    "根据你提供的模型评价指标，可以看出该模型的预测准确性较高。具体来说：\n",
    "\n",
    "1. 均方误差（MSE）为2.3184863195143905，这个值较小，说明模型预测值与真实值之间的差距相对较小。\n",
    "\n",
    "2. 均方根误差（RMSE）为1.522657650134918，也是较小的值，反映了模型预测的平均误差大小。\n",
    "\n",
    "3. 平均绝对误差（MAE）为0.7340409980692838，同样是一个较小的值，也说明了模型预测的准确性较高。\n",
    "\n",
    "4. R2分数为0.9697095054454898，接近于1，说明模型可以解释因变量变化的能力较强。\n",
    "\n",
    "综上所述，该模型的预测准确性较高，并且能够较好地解释因变量的变化。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6c753906",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
